1. Introduction

The Address Locator Tool is an online tool built for the Department of Housing and Community Development (DHCD). It uses information that comes from different sources and in different formats. This file documents all of those sources and the intermediate steps required to prepare the data for use as input to the tool.

This file is for internal use and is not intended to be distributed to anyone outside the organization.

2. Methodology

This section is organized by data source. Each subsection contains the code used to create the final dataset, which can be downloaded as a .csv or .pdf, along with an R script that gives more information about the data source.

2.1. GreatSchools.org data

GreatSchools.org provides information about school quality and several other social and demographic indicators, which the Address Locator Tool uses. To collect this data, we use a “scraper” written in Python.

The final dataset has information for 5,457 schools across the state. You can download the data here:

Since our data includes the latitude and longitude of each school, the following map shows where each school is located in the state.

2.2. diversitydatakids.org - Kirwan Institute Child Opportunity Index

The diversitydatakids.org - Kirwan Institute Child Opportunity Index (COI) is a comprehensive sociodemographic indicator that is used as a proxy for the sociodemographic status of each census tract. The Child Opportunity Index combines 19 separate component indicators into a single metric with five levels: Very Low, Low, Moderate, High or Very High. For more technical information, click here.

The COI is calculated for 1,368 census tracts, divided into four different metropolitan areas. To get the information for each of the metropolitan areas, you will need to download the data from the following links:

The final dataset is the following:
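Because the COI data arrives as one download per metropolitan area, the four files have to be stacked into a single table at some point. The following is a minimal sketch of that step, not the script actually used for this project. The column names (`tract_id`, `coi_level`, `metro_area`), the metro labels, and the sample values are all hypothetical and exist only to make the example runnable.

```python
import csv
import io


def combine_metro_files(sources):
    """Stack per-metro CSV tables into one list of rows.

    `sources` maps a metro-area label to an open text handle.
    Each row is tagged with the label it came from.
    """
    combined = []
    for metro_label, handle in sources.items():
        for row in csv.DictReader(handle):
            row["metro_area"] = metro_label  # hypothetical column name
            combined.append(row)
    return combined


# In-memory stand-ins for the downloaded files (made-up values).
sample_sources = {
    "metro_a": io.StringIO("tract_id,coi_level\n0001,High\n0002,Low\n"),
    "metro_b": io.StringIO("tract_id,coi_level\n0003,Very High\n"),
}

combined_rows = combine_metro_files(sample_sources)
print(len(combined_rows))
```

With real downloads, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. Keeping the metro label on every row makes it possible to check afterwards that the tract count per area matches what was downloaded.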

2.3. The Affirmatively Furthering Fair Housing (AFFH) Data

From the Affirmatively Furthering Fair Housing (AFFH) data (https://www.hudexchange.info/resource/4868/affh-raw-data/), we use the following five indicators:

Please note that the values are percentile ranked and range from 0 to 100. The higher the score, the better. For more technical information, please see here.
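To make the 0 to 100 scale concrete, the sketch below shows one common way to turn raw values into percentile ranks. It is an illustration only: the exact ranking method used for the AFFH indicators is not described in this file, and the function name and sample values are made up for the example.

```python
def percentile_ranks(values):
    """Return a 0-100 percentile rank for each value.

    Each rank is the share of values strictly below the given value,
    scaled so the minimum maps to 0 and the maximum to 100.
    This is one common definition, assumed here for illustration.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [100.0 * sum(other < v for other in values) / (n - 1) for v in values]


raw_scores = [30, 10, 50, 20, 40]  # made-up raw indicator values
ranks = percentile_ranks(raw_scores)
print(ranks)  # [50.0, 0.0, 100.0, 25.0, 75.0]
```

Under this definition, a tract with a rank of 75 scores higher than three quarters of the other tracts, which is why higher values are read as better.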

The final dataset is the following:

2.4. MassDOT data for buses and bus stops

The data was retrieved from the following links:

## Reading layer `Mass_buses' from data source `/Users/lauti/Google Drive/DHCD-tool/2019 Data/shapefiles/Mass_buses' using driver `ESRI Shapefile'
## Simple feature collection with 2368 features and 22 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: -73.37251 ymin: 41.24406 xmax: -69.95501 ymax: 42.91335
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `Mass_buses_stops' from data source `/Users/lauti/Google Drive/DHCD-tool/2019 Data/shapefiles/Mass_buses' using driver `ESRI Shapefile'
## Simple feature collection with 15992 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -73.372 ymin: 41.24403 xmax: -69.96195 ymax: 42.91025
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
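Both layers report longitude/latitude coordinates in WGS84 (EPSG 4326), so their bounding boxes can serve as a quick sanity check for other point data, such as the school coordinates. The sketch below copies the two bounding boxes from the output above and tests whether a point falls inside one of them. The helper names are hypothetical, and the sample coordinates (approximately downtown Boston) are only an illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float  # western edge (longitude)
    ymin: float  # southern edge (latitude)
    xmax: float  # eastern edge (longitude)
    ymax: float  # northern edge (latitude)


# Values copied from the layer summaries printed above.
BUS_ROUTES_BBOX = BBox(-73.37251, 41.24406, -69.95501, 42.91335)
BUS_STOPS_BBOX = BBox(-73.372, 41.24403, -69.96195, 42.91025)


def contains_point(bbox, lon, lat):
    """Return True if (lon, lat) lies inside the bounding box, edges included."""
    return bbox.xmin <= lon <= bbox.xmax and bbox.ymin <= lat <= bbox.ymax


# x is longitude and y is latitude; swapping them is a frequent mistake.
inside = contains_point(BUS_STOPS_BBOX, -71.0589, 42.3601)
outside = contains_point(BUS_STOPS_BBOX, -75.0, 42.0)
print(inside, outside)  # True False
```

A point that fails this check is either outside the area covered by the bus data or has its latitude and longitude swapped. Passing the check only shows that the point is within the overall extent, not that it is near a route or stop.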